In [1]:
import warnings
warnings.filterwarnings("ignore")
10X Visium Spatial Transcriptomics: Preprocessing and Quality Control¶
This notebook performs initial loading and quality control (QC) for multiple 10X Visium spatial transcriptomics samples.
Workflow:
- Setup: Import libraries and configure paths and parameters.
- Loop through samples:
a. Load sample data (
filtered_feature_bc_matrix.h5and spatial information) usingscanpy.read_visium. b. Ensure gene names (var_names) are unique. c. Verify spatial information loaded correctly. d. Perform Quality Control: i. Calculate QC metrics (total counts, genes per spot, mitochondrial percentage). ii. Generate and save pre-filtering QC plots (violin, scatter). iii. Determine dynamic filtering thresholds using Median Absolute Deviation (MAD). iv. Filter low-quality spots based on thresholds. v. Generate and save post-filtering QC plots (violin, spatial map showing filtered spots). e. Save the processed AnnData object (.h5ad) for the sample. - Conclusion: Summarize outputs.
1. Setup: Imports and Configuration¶
In [2]:
import scanpy as sc
import squidpy as sq # Often used alongside scanpy for spatial analysis
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pathlib import Path
import copy
import traceback
import warnings
import os
from scipy import stats # For Median Absolute Deviation (MAD)
Configuration Parameters¶
Set the paths to your data, define sample IDs, and specify QC parameters.
In [3]:
data_dir = Path("./GSE283269_RAW/")
sample_ids = ["Autopsy1", "Autopsy2", "Autopsy3", "Explanted1", "Explanted2", "Explanted3"] # Your sample IDs
count_file_name = "filtered_feature_bc_matrix.h5"
spatial_folder_name = "spatial"
tissue_positions_file_name = "tissue_positions.csv"
scalefactors_file_name = "scalefactors_json.json"
hires_image_name = "tissue_hires_image.png"
lowres_image_name = "tissue_lowres_image.png"
output_adata_dir = Path("./101_Preprocess_processed_adata")
qc_figures_dir = Path("./101_Preprocess_qc_figures")
output_adata_dir.mkdir(parents=True, exist_ok=True)
qc_figures_dir.mkdir(parents=True, exist_ok=True)
mt_gene_prefix = 'MT-'
mad_threshold_n_genes = 3.0
mad_threshold_total_counts = 3.0
mad_threshold_mt_pct = 3.0
min_gene_threshold = 50
min_count_threshold = 100
max_mt_threshold = 25.0
2. Processing Loop: Load, QC, Save¶
Iterate through each sample_id defined in the configuration.
In [4]:
adatas_processed = {} # Store processed adatas if needed later
individual_uns_spatial = {} # Store the .uns['spatial'] separately (as before)
print("Loading individual samples, applying QC, and saving...")
for sample_id in sample_ids:
print(f"--- Processing {sample_id} ---")
sample_path = data_dir / sample_id
# Using the specified file/folder names from config
count_file_path = sample_path / count_file_name
spatial_path = sample_path / spatial_folder_name
tissue_positions_path = spatial_path / tissue_positions_file_name
scalefactors_path = spatial_path / scalefactors_file_name
# --- File Checks ---
if not (sample_path.is_dir() and count_file_path.is_file() and spatial_path.is_dir() and
tissue_positions_path.is_file() and scalefactors_path.is_file()):
print(f" ERROR: One or more essential files missing for {sample_id}. Check paths/names:")
print(f" - Sample Path: {sample_path} (Exists: {sample_path.is_dir()})")
print(f" - Count File: {count_file_path} (Exists: {count_file_path.is_file()})")
print(f" - Spatial Folder: {spatial_path} (Exists: {spatial_path.is_dir()})")
print(f" - Tissue Positions: {tissue_positions_path} (Exists: {tissue_positions_path.is_file()})")
print(f" - Scalefactors: {scalefactors_path} (Exists: {scalefactors_path.is_file()})")
print(" Skipping.")
continue
try:
print(f" 1. Loading {sample_id} using scanpy.read_visium...")
# Suppress FutureWarning temporarily if desired
with warnings.catch_warnings():
warnings.filterwarnings("ignore", message=".*squidpy.*", category=FutureWarning)
warnings.filterwarnings("ignore", message="Variable names are not unique.", category=UserWarning)
adata_sample = sc.read_visium(
path=sample_path,
count_file=count_file_path.name, # Pass only filename relative to path
source_image_path=spatial_path / "tissue_hires_image.png", # Assuming hires image name
library_id=sample_id, # Important for uns structure
load_images=True,
)
print(f" Initial load: {adata_sample.n_obs} spots, {adata_sample.n_vars} genes")
# --- Make var_names unique ---
if not adata_sample.var_names.is_unique:
print(" 2. Making var_names unique...")
adata_sample.var_names_make_unique()
else:
print(" 2. Var_names are already unique.")
# --- Verify loading & Store .uns['spatial'] ---
# (Your original verification logic is good, adapted slightly)
print(f" 3. Verifying spatial metadata for library_id '{sample_id}'...")
obsm_ok = 'spatial' in adata_sample.obsm
uns_spatial_content = None
uns_ok = False
if 'spatial' in adata_sample.uns and isinstance(adata_sample.uns['spatial'], dict):
if sample_id in adata_sample.uns['spatial']:
uns_spatial_content = adata_sample.uns['spatial'][sample_id]
if isinstance(uns_spatial_content, dict) and \
'images' in uns_spatial_content and \
'scalefactors' in uns_spatial_content:
uns_ok = True
print(f" Metadata loaded and verified in .uns['spatial']['{sample_id}']: True")
# Store a DEEP COPY for later manual merging if needed
individual_uns_spatial[sample_id] = copy.deepcopy(uns_spatial_content)
else:
print(f" WARNING: Metadata structure incorrect within .uns['spatial']['{sample_id}']. Content: {uns_spatial_content}")
else:
print(f" ERROR: Library ID '{sample_id}' NOT FOUND as key within .uns['spatial']. Keys found: {list(adata_sample.uns['spatial'].keys())}")
else:
print(f" ERROR: Metadata key 'spatial' NOT FOUND in .uns or is not a dictionary.")
print(f" Coordinates loaded (.obsm['spatial']): {obsm_ok}")
if not (obsm_ok and uns_ok):
print(f" ERROR: Failed to load spatial coordinates or full/correct metadata for {sample_id}. Skipping QC and saving.")
continue
# --- <<< START QUALITY CONTROL >>> ---
print(f" 4. Performing Quality Control for {sample_id}...")
# 4a. Calculate Mitochondrial Gene Percentage
print(f" Calculating MT percentage using prefix '{mt_gene_prefix}'...")
adata_sample.var['mt'] = adata_sample.var_names.str.startswith(mt_gene_prefix)
sc.pp.calculate_qc_metrics(adata_sample, qc_vars=['mt'], percent_top=None, log1p=False, inplace=True)
n_mt_genes = adata_sample.var['mt'].sum()
if n_mt_genes == 0:
print(f" WARNING: No mitochondrial genes found with prefix '{mt_gene_prefix}'. Check the prefix or gene names. MT% filtering will be skipped.")
# Handle case where no MT genes are found - maybe skip MT filtering or use a dummy value
adata_sample.obs['pct_counts_mt'] = 0.0 # Assign 0 if no MT genes found
else:
print(f" Found {n_mt_genes} mitochondrial genes.")
# 4b. Generate Pre-filtering QC Plots
print(" Generating pre-filtering QC plots...")
fig, axs = plt.subplots(1, 3, figsize=(15, 4))
sc.pl.violin(adata_sample, ['n_genes_by_counts'], jitter=0.4, ax=axs[0], show=False)
sc.pl.violin(adata_sample, ['total_counts'], jitter=0.4, ax=axs[1], show=False)
sc.pl.violin(adata_sample, ['pct_counts_mt'], jitter=0.4, ax=axs[2], show=False)
fig.suptitle(f'QC Metrics Before Filtering - {sample_id}')
plt.tight_layout(rect=[0, 0, 1, 0.96]) # Adjust layout to prevent title overlap
plot_path = qc_figures_dir / f"{sample_id}_qc_violin_before_filtering.png"
plt.show()
plt.savefig(plot_path)
print(f" Saved pre-filter violin plot: {plot_path}")
plt.close(fig) # Close figure to free memory
# Scatter plots for relationships
fig, axs = plt.subplots(1, 2, figsize=(12, 5))
sc.pl.scatter(adata_sample, x='total_counts', y='pct_counts_mt', ax=axs[0], show=False)
sc.pl.scatter(adata_sample, x='total_counts', y='n_genes_by_counts', ax=axs[1], show=False)
fig.suptitle(f'QC Scatter Plots Before Filtering - {sample_id}')
plt.tight_layout(rect=[0, 0, 1, 0.96])
plot_path = qc_figures_dir / f"{sample_id}_qc_scatter_before_filtering.png"
plt.show()
plt.savefig(plot_path)
print(f" Saved pre-filter scatter plot: {plot_path}")
plt.close(fig)
# 4c. Determine Dynamic Thresholds using MAD
print(" Determining dynamic filtering thresholds...")
# Function to calculate MAD-based threshold (lower bound)
def get_lower_bound(metric_data, n_mad, fallback_min):
median_val = np.nanmedian(metric_data)
mad_val = stats.median_abs_deviation(metric_data, nan_policy='omit')
threshold = median_val - n_mad * mad_val
print(f" - Median: {median_val:.2f}, MAD: {mad_val:.2f}, Raw Lower Threshold ({n_mad} MAD): {threshold:.2f}")
# Ensure threshold is reasonable and above absolute minimum
threshold = max(threshold, fallback_min)
# Ensure threshold is not negative
threshold = max(threshold, 0)
print(f" - Final Lower Threshold (>= {fallback_min}, >= 0): {threshold:.2f}")
return threshold
# Function to calculate MAD-based threshold (upper bound)
def get_upper_bound(metric_data, n_mad, fallback_max):
median_val = np.nanmedian(metric_data)
mad_val = stats.median_abs_deviation(metric_data, nan_policy='omit')
threshold = median_val + n_mad * mad_val
print(f" - Median: {median_val:.2f}, MAD: {mad_val:.2f}, Raw Upper Threshold ({n_mad} MAD): {threshold:.2f}")
# Ensure threshold is reasonable and below absolute maximum
threshold = min(threshold, fallback_max)
# Ensure threshold is not > 100 for percentages
if fallback_max <= 100:
threshold = min(threshold, 100.0)
print(f" - Final Upper Threshold (<= {fallback_max}, <= 100): {threshold:.2f}")
# Handle case where MAD is 0 (all values are the same) - use fallback directly
if mad_val == 0:
print(f" - MAD is 0, using fallback max: {fallback_max}")
return fallback_max
return threshold
print(" Calculating threshold for n_genes_by_counts:")
thresh_n_genes_lower = get_lower_bound(adata_sample.obs['n_genes_by_counts'], mad_threshold_n_genes, min_gene_threshold)
print(" Calculating threshold for total_counts:")
thresh_total_counts_lower = get_lower_bound(adata_sample.obs['total_counts'], mad_threshold_total_counts, min_count_threshold)
thresh_mt_upper = max_mt_threshold # Default to absolute max
if n_mt_genes > 0 : # Only calculate dynamic threshold if MT genes exist
print(" Calculating threshold for pct_counts_mt:")
thresh_mt_upper = get_upper_bound(adata_sample.obs['pct_counts_mt'], mad_threshold_mt_pct, max_mt_threshold)
else:
print(f" Skipping dynamic MT% threshold calculation (no MT genes found). Using fixed max: {max_mt_threshold}")
# 4d. Filter Spots (Cells/Observations)
n_obs_before = adata_sample.n_obs
print(f" Applying filters: n_genes_by_counts >= {thresh_n_genes_lower}, total_counts >= {thresh_total_counts_lower}, pct_counts_mt <= {thresh_mt_upper}")
# Create boolean masks for filtering
obs_mask_genes = adata_sample.obs['n_genes_by_counts'] >= thresh_n_genes_lower
obs_mask_counts = adata_sample.obs['total_counts'] >= thresh_total_counts_lower
obs_mask_mt = adata_sample.obs['pct_counts_mt'] <= thresh_mt_upper
# Combine masks
adata_sample.obs['pass_qc'] = obs_mask_genes & obs_mask_counts & obs_mask_mt
# Apply the filter by creating a new view/copy
adata_filtered = adata_sample[adata_sample.obs['pass_qc']].copy() # Use .copy() crucial here!
n_obs_after = adata_filtered.n_obs
n_removed = n_obs_before - n_obs_after
percent_removed = (n_removed / n_obs_before * 100) if n_obs_before > 0 else 0
print(f" Filtering removed {n_removed} spots ({percent_removed:.2f}%). Kept {n_obs_after} spots.")
# (Optional) Filter Mitochondrial Genes (Variables)
# Usually done *after* calculating MT% but before downstream analysis like normalization
n_vars_before = adata_filtered.n_vars
adata_filtered = adata_filtered[:, ~adata_filtered.var['mt']].copy() # Remove MT genes
n_vars_after = adata_filtered.n_vars
print(f" (Optional) Removed {n_vars_before - n_vars_after} mitochondrial genes.")
# 4e. Generate Post-filtering QC Plots (Optional but Recommended)
print(" Generating post-filtering QC plots...")
fig, axs = plt.subplots(1, 3, figsize=(15, 4))
sc.pl.violin(adata_filtered, ['n_genes_by_counts'], jitter=0.4, ax=axs[0], show=False)
sc.pl.violin(adata_filtered, ['total_counts'], jitter=0.4, ax=axs[1], show=False)
sc.pl.violin(adata_filtered, ['pct_counts_mt'], jitter=0.4, ax=axs[2], show=False)
fig.suptitle(f'QC Metrics After Filtering - {sample_id}')
plt.tight_layout(rect=[0, 0, 1, 0.96])
plot_path = qc_figures_dir / f"{sample_id}_qc_violin_after_filtering.png"
plt.show()
plt.savefig(plot_path)
print(f" Saved post-filter violin plot: {plot_path}")
plt.close(fig)
# Spatial plot showing filtered spots (requires spatial coords)
fig, ax = plt.subplots(figsize=(8, 7))
sc.pl.spatial(adata_sample, color=['pass_qc'], library_id=sample_id, title=f'Spots Passing QC - {sample_id}',
ax=ax, show=False, legend_loc='right margin', legend_fontsize=8, size=1.2) # Use original adata to show context
plot_path = qc_figures_dir / f"{sample_id}_qc_spatial_filter_map.png"
plt.subplots_adjust(right=0.8)
plt.tight_layout()
plt.show()
plt.savefig(plot_path)
print(f" Saved spatial filter map: {plot_path}")
plt.close(fig)
# --- <<< END QUALITY CONTROL >>> ---
# --- Save Processed Data ---
print(f" 5. Saving processed AnnData object for {sample_id}...")
output_file = output_adata_dir / f"QC_processed_{sample_id}.h5ad"
# Ensure the AnnData object is clean before saving (remove temporary columns if desired)
# del adata_filtered.obs['pass_qc'] # Optional cleanup
adata_filtered.write_h5ad(output_file)
print(f" Saved processed data to: {output_file}")
print(f" Final dimensions: {adata_filtered.n_obs} spots, {adata_filtered.n_vars} genes")
# Store the processed adata if needed for later steps in the same script
# adatas_processed[sample_id] = adata_filtered
print(f"--- Successfully processed and saved {sample_id} ---\n")
except Exception as e:
print(f" FATAL ERROR during processing for {sample_id}: {e}")
traceback.print_exc()
print(f" Skipping {sample_id} due to fatal error.\n")
continue
print("="*40)
print("All samples processed.")
Loading individual samples, applying QC, and saving...
--- Processing Autopsy1 ---
1. Loading Autopsy1 using scanpy.read_visium...
Initial load: 799 spots, 18085 genes
2. Making var_names unique...
3. Verifying spatial metadata for library_id 'Autopsy1'...
Metadata loaded and verified in .uns['spatial']['Autopsy1']: True
Coordinates loaded (.obsm['spatial']): True
4. Performing Quality Control for Autopsy1...
Calculating MT percentage using prefix 'MT-'...
Found 11 mitochondrial genes.
Generating pre-filtering QC plots...
Saved pre-filter violin plot: 101_Preprocess_qc_figures/Autopsy1_qc_violin_before_filtering.png
<Figure size 640x480 with 0 Axes>
Saved pre-filter scatter plot: 101_Preprocess_qc_figures/Autopsy1_qc_scatter_before_filtering.png
Determining dynamic filtering thresholds...
Calculating threshold for n_genes_by_counts:
- Median: 1463.00, MAD: 538.00, Raw Lower Threshold (3.0 MAD): -151.00
- Final Lower Threshold (>= 50, >= 0): 50.00
Calculating threshold for total_counts:
- Median: 2200.00, MAD: 975.00, Raw Lower Threshold (3.0 MAD): -725.00
- Final Lower Threshold (>= 100, >= 0): 100.00
Calculating threshold for pct_counts_mt:
- Median: 5.47, MAD: 1.69, Raw Upper Threshold (3.0 MAD): 10.53
- Final Upper Threshold (<= 25.0, <= 100): 10.53
Applying filters: n_genes_by_counts >= 50, total_counts >= 100, pct_counts_mt <= 10.526315689086914
Filtering removed 173 spots (21.65%). Kept 626 spots.
(Optional) Removed 11 mitochondrial genes.
Generating post-filtering QC plots...
<Figure size 640x480 with 0 Axes>
Saved post-filter violin plot: 101_Preprocess_qc_figures/Autopsy1_qc_violin_after_filtering.png
<Figure size 640x480 with 0 Axes>
Saved spatial filter map: 101_Preprocess_qc_figures/Autopsy1_qc_spatial_filter_map.png 5. Saving processed AnnData object for Autopsy1...
Saved processed data to: 101_Preprocess_processed_adata/QC_processed_Autopsy1.h5ad
Final dimensions: 626 spots, 18074 genes
--- Successfully processed and saved Autopsy1 ---
--- Processing Autopsy2 ---
1. Loading Autopsy2 using scanpy.read_visium...
Initial load: 732 spots, 18085 genes
2. Making var_names unique...
3. Verifying spatial metadata for library_id 'Autopsy2'...
Metadata loaded and verified in .uns['spatial']['Autopsy2']: True
Coordinates loaded (.obsm['spatial']): True
4. Performing Quality Control for Autopsy2...
Calculating MT percentage using prefix 'MT-'...
Found 11 mitochondrial genes.
Generating pre-filtering QC plots...
<Figure size 640x480 with 0 Axes>
Saved pre-filter violin plot: 101_Preprocess_qc_figures/Autopsy2_qc_violin_before_filtering.png
<Figure size 640x480 with 0 Axes>
Saved pre-filter scatter plot: 101_Preprocess_qc_figures/Autopsy2_qc_scatter_before_filtering.png
Determining dynamic filtering thresholds...
Calculating threshold for n_genes_by_counts:
- Median: 327.00, MAD: 124.00, Raw Lower Threshold (3.0 MAD): -45.00
- Final Lower Threshold (>= 50, >= 0): 50.00
Calculating threshold for total_counts:
- Median: 381.50, MAD: 154.00, Raw Lower Threshold (3.0 MAD): -80.50
- Final Lower Threshold (>= 100, >= 0): 100.00
Calculating threshold for pct_counts_mt:
- Median: 9.62, MAD: 3.03, Raw Upper Threshold (3.0 MAD): 18.71
- Final Upper Threshold (<= 25.0, <= 100): 18.71
Applying filters: n_genes_by_counts >= 50, total_counts >= 100, pct_counts_mt <= 18.712793350219727
Filtering removed 37 spots (5.05%). Kept 695 spots.
(Optional) Removed 11 mitochondrial genes.
Generating post-filtering QC plots...
<Figure size 640x480 with 0 Axes>
Saved post-filter violin plot: 101_Preprocess_qc_figures/Autopsy2_qc_violin_after_filtering.png
<Figure size 640x480 with 0 Axes>
Saved spatial filter map: 101_Preprocess_qc_figures/Autopsy2_qc_spatial_filter_map.png 5. Saving processed AnnData object for Autopsy2...
Saved processed data to: 101_Preprocess_processed_adata/QC_processed_Autopsy2.h5ad
Final dimensions: 695 spots, 18074 genes
--- Successfully processed and saved Autopsy2 ---
--- Processing Autopsy3 ---
1. Loading Autopsy3 using scanpy.read_visium...
Initial load: 1741 spots, 18085 genes
2. Making var_names unique...
3. Verifying spatial metadata for library_id 'Autopsy3'...
Metadata loaded and verified in .uns['spatial']['Autopsy3']: True
Coordinates loaded (.obsm['spatial']): True
4. Performing Quality Control for Autopsy3...
Calculating MT percentage using prefix 'MT-'...
Found 11 mitochondrial genes.
Generating pre-filtering QC plots...
<Figure size 640x480 with 0 Axes>
Saved pre-filter violin plot: 101_Preprocess_qc_figures/Autopsy3_qc_violin_before_filtering.png
<Figure size 640x480 with 0 Axes>
Saved pre-filter scatter plot: 101_Preprocess_qc_figures/Autopsy3_qc_scatter_before_filtering.png
Determining dynamic filtering thresholds...
Calculating threshold for n_genes_by_counts:
- Median: 1085.00, MAD: 469.00, Raw Lower Threshold (3.0 MAD): -322.00
- Final Lower Threshold (>= 50, >= 0): 50.00
Calculating threshold for total_counts:
- Median: 1458.00, MAD: 699.00, Raw Lower Threshold (3.0 MAD): -639.00
- Final Lower Threshold (>= 100, >= 0): 100.00
Calculating threshold for pct_counts_mt:
- Median: 6.61, MAD: 2.51, Raw Upper Threshold (3.0 MAD): 14.14
- Final Upper Threshold (<= 25.0, <= 100): 14.14
Applying filters: n_genes_by_counts >= 50, total_counts >= 100, pct_counts_mt <= 14.139665603637695
Filtering removed 412 spots (23.66%). Kept 1329 spots.
(Optional) Removed 11 mitochondrial genes.
Generating post-filtering QC plots...
<Figure size 640x480 with 0 Axes>
Saved post-filter violin plot: 101_Preprocess_qc_figures/Autopsy3_qc_violin_after_filtering.png
<Figure size 640x480 with 0 Axes>
Saved spatial filter map: 101_Preprocess_qc_figures/Autopsy3_qc_spatial_filter_map.png 5. Saving processed AnnData object for Autopsy3...
Saved processed data to: 101_Preprocess_processed_adata/QC_processed_Autopsy3.h5ad
Final dimensions: 1329 spots, 18074 genes
--- Successfully processed and saved Autopsy3 ---
--- Processing Explanted1 ---
1. Loading Explanted1 using scanpy.read_visium...
Initial load: 999 spots, 18085 genes
2. Making var_names unique...
3. Verifying spatial metadata for library_id 'Explanted1'...
Metadata loaded and verified in .uns['spatial']['Explanted1']: True
Coordinates loaded (.obsm['spatial']): True
4. Performing Quality Control for Explanted1...
Calculating MT percentage using prefix 'MT-'...
Found 11 mitochondrial genes.
Generating pre-filtering QC plots...
<Figure size 640x480 with 0 Axes>
Saved pre-filter violin plot: 101_Preprocess_qc_figures/Explanted1_qc_violin_before_filtering.png
<Figure size 640x480 with 0 Axes>
Saved pre-filter scatter plot: 101_Preprocess_qc_figures/Explanted1_qc_scatter_before_filtering.png
Determining dynamic filtering thresholds...
Calculating threshold for n_genes_by_counts:
- Median: 1327.00, MAD: 812.00, Raw Lower Threshold (3.0 MAD): -1109.00
- Final Lower Threshold (>= 50, >= 0): 50.00
Calculating threshold for total_counts:
- Median: 1839.00, MAD: 1257.00, Raw Lower Threshold (3.0 MAD): -1932.00
- Final Lower Threshold (>= 100, >= 0): 100.00
Calculating threshold for pct_counts_mt:
- Median: 2.77, MAD: 0.99, Raw Upper Threshold (3.0 MAD): 5.74
- Final Upper Threshold (<= 25.0, <= 100): 5.74
Applying filters: n_genes_by_counts >= 50, total_counts >= 100, pct_counts_mt <= 5.740735054016113
Filtering removed 149 spots (14.91%). Kept 850 spots.
(Optional) Removed 11 mitochondrial genes.
Generating post-filtering QC plots...
<Figure size 640x480 with 0 Axes>
Saved post-filter violin plot: 101_Preprocess_qc_figures/Explanted1_qc_violin_after_filtering.png
<Figure size 640x480 with 0 Axes>
Saved spatial filter map: 101_Preprocess_qc_figures/Explanted1_qc_spatial_filter_map.png 5. Saving processed AnnData object for Explanted1...
Saved processed data to: 101_Preprocess_processed_adata/QC_processed_Explanted1.h5ad
Final dimensions: 850 spots, 18074 genes
--- Successfully processed and saved Explanted1 ---
--- Processing Explanted2 ---
1. Loading Explanted2 using scanpy.read_visium...
Initial load: 1359 spots, 18085 genes
2. Making var_names unique...
3. Verifying spatial metadata for library_id 'Explanted2'...
Metadata loaded and verified in .uns['spatial']['Explanted2']: True
Coordinates loaded (.obsm['spatial']): True
4. Performing Quality Control for Explanted2...
Calculating MT percentage using prefix 'MT-'...
Found 11 mitochondrial genes.
Generating pre-filtering QC plots...
<Figure size 640x480 with 0 Axes>
Saved pre-filter violin plot: 101_Preprocess_qc_figures/Explanted2_qc_violin_before_filtering.png
<Figure size 640x480 with 0 Axes>
Saved pre-filter scatter plot: 101_Preprocess_qc_figures/Explanted2_qc_scatter_before_filtering.png
Determining dynamic filtering thresholds...
Calculating threshold for n_genes_by_counts:
- Median: 919.00, MAD: 447.00, Raw Lower Threshold (3.0 MAD): -422.00
- Final Lower Threshold (>= 50, >= 0): 50.00
Calculating threshold for total_counts:
- Median: 1194.00, MAD: 664.00, Raw Lower Threshold (3.0 MAD): -798.00
- Final Lower Threshold (>= 100, >= 0): 100.00
Calculating threshold for pct_counts_mt:
- Median: 1.99, MAD: 0.72, Raw Upper Threshold (3.0 MAD): 4.15
- Final Upper Threshold (<= 25.0, <= 100): 4.15
Applying filters: n_genes_by_counts >= 50, total_counts >= 100, pct_counts_mt <= 4.149576663970947
Filtering removed 201 spots (14.79%). Kept 1158 spots.
(Optional) Removed 11 mitochondrial genes.
Generating post-filtering QC plots...
<Figure size 640x480 with 0 Axes>
Saved post-filter violin plot: 101_Preprocess_qc_figures/Explanted2_qc_violin_after_filtering.png
<Figure size 640x480 with 0 Axes>
Saved spatial filter map: 101_Preprocess_qc_figures/Explanted2_qc_spatial_filter_map.png 5. Saving processed AnnData object for Explanted2...
Saved processed data to: 101_Preprocess_processed_adata/QC_processed_Explanted2.h5ad
Final dimensions: 1158 spots, 18074 genes
--- Successfully processed and saved Explanted2 ---
--- Processing Explanted3 ---
1. Loading Explanted3 using scanpy.read_visium...
Initial load: 952 spots, 18085 genes
2. Making var_names unique...
3. Verifying spatial metadata for library_id 'Explanted3'...
Metadata loaded and verified in .uns['spatial']['Explanted3']: True
Coordinates loaded (.obsm['spatial']): True
4. Performing Quality Control for Explanted3...
Calculating MT percentage using prefix 'MT-'...
Found 11 mitochondrial genes.
Generating pre-filtering QC plots...
<Figure size 640x480 with 0 Axes>
Saved pre-filter violin plot: 101_Preprocess_qc_figures/Explanted3_qc_violin_before_filtering.png
<Figure size 640x480 with 0 Axes>
Saved pre-filter scatter plot: 101_Preprocess_qc_figures/Explanted3_qc_scatter_before_filtering.png
Determining dynamic filtering thresholds...
Calculating threshold for n_genes_by_counts:
- Median: 1871.00, MAD: 1106.00, Raw Lower Threshold (3.0 MAD): -1447.00
- Final Lower Threshold (>= 50, >= 0): 50.00
Calculating threshold for total_counts:
- Median: 2805.50, MAD: 1887.00, Raw Lower Threshold (3.0 MAD): -2855.50
- Final Lower Threshold (>= 100, >= 0): 100.00
Calculating threshold for pct_counts_mt:
- Median: 3.02, MAD: 0.71, Raw Upper Threshold (3.0 MAD): 5.16
- Final Upper Threshold (<= 25.0, <= 100): 5.16
Applying filters: n_genes_by_counts >= 50, total_counts >= 100, pct_counts_mt <= 5.161240577697754
Filtering removed 59 spots (6.20%). Kept 893 spots.
(Optional) Removed 11 mitochondrial genes.
Generating post-filtering QC plots...
<Figure size 640x480 with 0 Axes>
Saved post-filter violin plot: 101_Preprocess_qc_figures/Explanted3_qc_violin_after_filtering.png
<Figure size 640x480 with 0 Axes>
Saved spatial filter map: 101_Preprocess_qc_figures/Explanted3_qc_spatial_filter_map.png 5. Saving processed AnnData object for Explanted3...
Saved processed data to: 101_Preprocess_processed_adata/QC_processed_Explanted3.h5ad
Final dimensions: 893 spots, 18074 genes
--- Successfully processed and saved Explanted3 ---
========================================
All samples processed.
<Figure size 640x480 with 0 Axes>
3. Conclusion¶
All specified samples have been processed.
- QC-filtered AnnData objects are saved as
.h5adfiles in the directory:./101_Preprocess_processed_adata - QC plots (pre- and post-filtering violins, scatter plots, and spatial filter maps) are saved as
.pngfiles in the directory:./101_Preprocess_qc_figures
These processed files can now be used for downstream analyses such as normalization, clustering, differential expression, and further spatial analysis.
In [5]:
print("\nProcessed AnnData files created:")
for f in sorted(output_adata_dir.glob('*.h5ad')):
print(f'- {f.name}')
print("\nQC Figure files created:")
for f in sorted(qc_figures_dir.glob('*.png')):
print(f'- {f.name}')
Processed AnnData files created: - QC_processed_Autopsy1.h5ad - QC_processed_Autopsy2.h5ad - QC_processed_Autopsy3.h5ad - QC_processed_Explanted1.h5ad - QC_processed_Explanted2.h5ad - QC_processed_Explanted3.h5ad QC Figure files created: - Autopsy1_qc_scatter_before_filtering.png - Autopsy1_qc_spatial_filter_map.png - Autopsy1_qc_violin_after_filtering.png - Autopsy1_qc_violin_before_filtering.png - Autopsy2_qc_scatter_before_filtering.png - Autopsy2_qc_spatial_filter_map.png - Autopsy2_qc_violin_after_filtering.png - Autopsy2_qc_violin_before_filtering.png - Autopsy3_qc_scatter_before_filtering.png - Autopsy3_qc_spatial_filter_map.png - Autopsy3_qc_violin_after_filtering.png - Autopsy3_qc_violin_before_filtering.png - Explanted1_qc_scatter_before_filtering.png - Explanted1_qc_spatial_filter_map.png - Explanted1_qc_violin_after_filtering.png - Explanted1_qc_violin_before_filtering.png - Explanted2_qc_scatter_before_filtering.png - Explanted2_qc_spatial_filter_map.png - Explanted2_qc_violin_after_filtering.png - Explanted2_qc_violin_before_filtering.png - Explanted3_qc_scatter_before_filtering.png - Explanted3_qc_spatial_filter_map.png - Explanted3_qc_violin_after_filtering.png - Explanted3_qc_violin_before_filtering.png